**MAPLD 2008** 

Annapolis, Maryland



### Hardening-by-design techniques using residue number system in SRAM-based FPGAs: an experiment on a FIR filter

S. Pontarelli, G.C. Cardarilli, A. Salsano Università di Roma "Tor Vergata", Roma, Italy

S. Gerardin, <u>A. Manuzzato</u>, A. Paccagnella DEI - Università di Padova, Padova, Italy



**RREACT -** Reliability and Radiation Effects on Advanced CMOS Technologies





### Introduction

### Traditional hardening approach

- basic idea and drawbacks

### Residue Number System

- definition and properties
- error detection and correction capabilities

### Case study: hardening a FIR filter

- hardened circuit
- irradiation experiments

### Conclusions




### Introduction

**THE PROBLEM**: the use of SRAM-based FPGAs in harsh **radiation** environments is limited by the **susceptibility** of such devices to radiation effects

Single Event Upsets affecting the configuration memory can alter the implemented circuit functionality

**"THE SOLUTION"**: if we want to use commercial SRAM-based FPGAs in radiation environments, **hardening-by-design** techniques are mandatory to preserve the correct circuit functionality





## Traditional Approach: TMR

A widely used approach to improve the system reliability is the **Triple Modular Redundancy** (TMR) technique



<u>The idea</u>: all the logic is tripled and a majority voter chooses the circuit outputs, masking errors to the outside world
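The voting step can be illustrated with a minimal behavioral sketch in Python. It is not the hardware voter used on the FPGA; the function name `majority_vote` and the sample values are assumptions made for illustration only.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three redundant outputs."""
    return (a & b) | (a & c) | (b & c)


# Hypothetical 8-bit output of the three redundant domains.
golden = 0b1011_0010
faulty = golden ^ 0b0000_1000  # one domain has a single flipped bit

# A fault confined to one domain is masked by the voter.
assert majority_vote(golden, faulty, golden) == golden

# Faults hitting the same bit in two domains are NOT masked.
assert majority_vote(faulty, faulty, golden) != golden
```

The second assertion shows why upsets that reach two redundant domains at once defeat the scheme.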



This approach is very <u>area expensive</u>!

- area increases more than 3 times
- **power** consumption increases (tripled logic, tripled clock distribution...)
- needs triple **I/Os** (plus voltage references)
- degradation in **timing performance**

Problems affecting this hardening technique:

- Multiple Bit Upsets (MBUs) can simultaneously alter two redundant domains [*Quinn et al.* TNS Dec. 2007]
- In the configuration memory there are bits controlling multiple resources, which creates single points of failure (due to the FPGA architecture) [*Sterpone et al.* TNS Aug. 2006]






## **Residue Number System**

A Residue Number System is defined by a set of relatively prime integers

In this work we present a hardening technique based on the **Residue Number System** to implement FIR filters with error detection and correction capabilities











## RNS background

RRNS moduli:

$$\{\underbrace{m_1, m_2, \dots, m_k}_{\text{normal moduli}}, \underbrace{m_{k+1}, \dots, m_{k+r}}_{\text{redundant moduli}}\}$$

Minimum to correct 1 error : **3** moduli + **2** redundant moduli

The normal moduli define the system dynamic range M:

$$M = \prod_{i=1}^{k} m_i \implies x \in [0, M - 1]$$

The product of all the moduli defines the total range $M_T$:

$$M_{T} = \prod_{i=1}^{k+r} m_{i}$$

$$[\underbrace{0, \dots, M-1}_{\text{legitimate range}}, \underbrace{M, \dots, M_{T}-1}_{\text{illegitimate range}}]$$
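As a small numeric illustration of these definitions, the sketch below uses a hypothetical moduli set (3, 5, 7 as normal moduli and 11, 13 as redundant moduli). These values are assumptions chosen for readability, not the moduli used in the filter; the helper names are likewise illustrative.

```python
from itertools import combinations
from math import gcd, prod

# Hypothetical moduli set: 3 normal + 2 redundant moduli.
NORMAL = (3, 5, 7)
REDUNDANT = (11, 13)
MODULI = NORMAL + REDUNDANT

# An RNS requires pairwise relatively prime moduli.
assert all(gcd(a, b) == 1 for a, b in combinations(MODULI, 2))

M = prod(NORMAL)    # dynamic range: 3 * 5 * 7 = 105
M_T = prod(MODULI)  # total range: 105 * 11 * 13 = 15015


def to_residues(x: int) -> tuple:
    """Forward conversion: one residue digit per modulus."""
    return tuple(x % m for m in MODULI)


def is_legitimate(x: int) -> bool:
    """True if x lies in the legitimate range [0, M - 1]."""
    return 0 <= x < M


print(M, M_T)           # 105 15015
print(to_residues(52))  # (1, 2, 3, 8, 0)
```

With this set, every valid value lies in [0, 104], while the total range extends to 15014.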



## Error detection and correction

Important properties hold for the *m<sub>i</sub>-projection*, defined as:

$$X_{mi} = X \mod\left(\frac{M_T}{m_i}\right) = CRT(x_{m_1}, x_{m_2}, \dots, x_{m_{i-1}}, x_{m_{i+1}}, \dots, x_{m_{k+r}})$$

where the argument of the CRT (Chinese Remainder Theorem reconstruction) is the residue vector representation of $X$ with the residue digit *i* deleted

Error detection and localization

If an error affects module *i*, then projection *i* falls in the *legitimate* range and the others fall in the *illegitimate* range

Error correction


The correct output value can be obtained by performing the reverse conversion of the $m_i$-projection ($X_{mi}$)
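The detection, localization and correction steps can be sketched in software as follows. This is a behavioral model under stated assumptions, not the hardware implementation: the moduli (3, 5, 7 normal; 11, 13 redundant) and the function names `crt`, `projection` and `correct` are hypothetical choices for illustration.

```python
from math import prod

# Hypothetical moduli set: 3 normal + 2 redundant moduli.
NORMAL = (3, 5, 7)
REDUNDANT = (11, 13)
MODULI = NORMAL + REDUNDANT
M = prod(NORMAL)  # legitimate range is [0, M - 1]


def crt(residues, moduli):
    """Reverse conversion via the Chinese Remainder Theorem."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        x += r * partial * pow(partial, -1, m)
    return x % total


def projection(residues, i):
    """m_i-projection: reverse conversion with residue digit i deleted."""
    return crt(residues[:i] + residues[i + 1:], MODULI[:i] + MODULI[i + 1:])


def correct(residues):
    """Return (value, index of the faulty digit or None)."""
    full = crt(residues, MODULI)
    if full < M:  # legitimate: no error detected
        return full, None
    legitimate = []
    for i in range(len(MODULI)):
        p = projection(residues, i)
        if p < M:
            legitimate.append((i, p))
    if len(legitimate) != 1:
        raise ValueError("error not correctable with this moduli set")
    i, p = legitimate[0]
    return p, i


# 52 is encoded as (1, 2, 3, 8, 0); corrupt the digit for modulus 5.
corrupted = (1, 4, 3, 8, 0)
assert correct(corrupted) == (52, 1)
```

In the example, only the projection that discards the corrupted digit falls below M, so it both localizes the error and yields the corrected value.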

A. Manuzzato – MAPLD 2008



## **RRNS FIR filter**






## TMR Hardened RRNS FIR filter

<u>Possible solution</u>: protection of the conversion blocks (CRTs) and the "choose legitimate" block with TMR



1 CRT per modulus × 3, plus 3 "choose legitimate" blocks: high area occupation!


## Our RRNS hardened implementation

### Our solution: create a "legitimate voter" block






### Area comparison: TMR-RRNS vs. Our RRNS implementation

| Filter | Number of taps | Dynamic range | TMR-RRNS overhead [# of LUTs] | Our implementation overhead [# of LUTs] | Ratio (ours / TMR-RRNS) |
|--------|----------------|---------------|-------------------------------|-----------------------------------------|-------------------------|
| FIR1   | 16                | 20    | 7407                                    | 2931                                                 | 40% |
| FIR2   | 64                | 22    | 9774                                    | 3763                                                 | 39% |
| FIR3   | 256               | 24    | 17037                                   | 5780                                                 | 34% |
| FIR4   | 16                | 28    | 17127                                   | 5927                                                 | 35% |
| FIR5   | 64                | 30    | 17196                                   | 5951                                                 | 35% |
| FIR6   | 256               | 32    | 19242                                   | 7044                                                 | 37% |
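The percentage column is consistent with the ratio between the two overhead columns (our implementation divided by TMR-RRNS). A quick check, using only the LUT counts from the table (the variable names are illustrative):

```python
# (TMR-RRNS overhead, our implementation overhead), in LUTs, from the table.
overheads = {
    "FIR1": (7407, 2931),
    "FIR2": (9774, 3763),
    "FIR3": (17037, 5780),
    "FIR4": (17127, 5927),
    "FIR5": (17196, 5951),
    "FIR6": (19242, 7044),
}

ratios = {name: round(100 * ours / tmr) for name, (tmr, ours) in overheads.items()}

for name, pct in ratios.items():
    print(f"{name}: {pct}%")
```

The rounded ratios reproduce the 40%, 39%, 34%, 35%, 35% and 37% entries, so the proposed scheme needs roughly one third to two fifths of the TMR-RRNS overhead.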





## Irradiation Experiment Setup

**DUT**: hardened FIR filter implemented on a Xilinx Spartan-3 FPGA (XC3S200) irradiated with **alpha particles** (<sup>241</sup>Am source)

### Control board:

- provides the stimuli to the DUT filter
- compares the circuit behavior to the expected one
- reads back/configures the DUT configuration memory (JTAG)

DUT filter I/Os plus <u>additional debug signals</u> to monitor the "Legitimate Voters" behavior and to localize the induced errors



## Experimental Results & Discussion

- The DUT was irradiated until an illegitimate value or two different legitimate values were detected at the "legitimate voter" inputs. After each event the DUT was fully reconfigured.

- We collected thousands of events and observed **no errors** at the filter outputs after the "legitimate voter", which **worked properly in all situations**

- We encountered some "strange events": on some occasions the legitimate voter received erroneous input data, even though there were <u>no errors in the configuration memory</u>. The device recovered after minutes (after several full reconfigurations), possibly due to **half latch** problems [Graham et al. TNS Dec. 2003]
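The stop condition used during irradiation (an illegitimate value, or two different legitimate values, at the "legitimate voter" inputs) can be expressed as a simple predicate. The sketch below is a hypothetical software model of that check only; it does not describe the control board logic, and the function name and the range bound `M = 105` are assumptions.

```python
def is_recorded_event(voter_inputs, M):
    """True if the voter inputs match the stop condition described above."""
    legitimate = {v for v in voter_inputs if 0 <= v < M}
    has_illegitimate = any(not (0 <= v < M) for v in voter_inputs)
    # Event: any illegitimate input, or legitimate inputs that disagree.
    return has_illegitimate or len(legitimate) > 1


M = 105  # hypothetical dynamic range

assert not is_recorded_event([52, 52, 52, 52, 52], M)  # fault-free operation
assert is_recorded_event([2054, 52, 1339, 1144, 514], M)  # illegitimate values
assert is_recorded_event([52, 53, 52, 52, 52], M)  # two legitimate values
```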



## Experimental Results & Discussion

The added debug signals allow us to localize the induced faults

| Error location   | Events [%] |
|------------------|------------|
| FIR module       | 27 %       |
| CRT block        | 59 %       |
| Legitimate voter | 14 %       |

Faults classification:

- **FIR module**: error in the binary to RNS converter or in the i-th FIR module
- **CRT block**: error in the RNS to binary conversion block
- **Legitimate voter**: error in the "Legitimate voter" block



## Conclusions

We presented a hardening-by-design technique based on the Redundant Residue Number System, well suited for hardening FIR filters and DSPs. It features an innovative "legitimate voter" and offers:

- fault tolerance with respect to upsets in the configuration memory, as demonstrated by radiation tests

- lower FPGA resource usage as compared to a conventional hardening approach based on the triplication of the output conversion (CRT) and "choose legitimate" blocks

